Skip to content

Comments

feat: MLPerf v3 Compliance#231

Open
hazemawadalla wants to merge 8 commits intomlcommons:mainfrom
hazemawadalla:main
Open

feat: MLPerf v3 Compliance#231
hazemawadalla wants to merge 8 commits intomlcommons:mainfrom
hazemawadalla:main

Conversation

@hazemawadalla
Copy link
Contributor

Summary

This PR brings the KV Cache Benchmark into alignment with MLPerf Storage v3.0 requirements, adds a comprehensive configuration system, and includes a full test suite.

Core Benchmark Engine (kv-cache.py)

  • Add ConfigLoader class with YAML config file support and schema validation
  • Add cfg() helper function for config-driven parameter access
  • Add validate_args() with safety limits for protected system paths
  • Rename all nvme_* metrics to storage_* for MLPerf terminology compliance
  • Add extended QoS percentiles: P99.9 and P99.99 latency tracking
  • Add per-tier bandwidth metrics (read/write GB/s per tier)
  • Add per-tier KV bytes tracking for detailed storage analysis
  • Fix GPU metadata desync bug via on_eviction_callback pattern
  • Change eviction from single-shot to iterative loop until space freed
  • Replace print statements with Python logging module
  • Add waterfall LRU eviction with configurable high/low watermarks
  • Add storage_health section with PASS/FAIL criteria
  • Add storage_throughput_tokens_per_sec as primary MLPerf metric

Wrapper Script (kv-cache-wrapper.sh)

  • Add -c DIR option for custom config directory
  • Generate and pass config.yaml to Python script via --config flag
  • Add --xlsx-output support for Excel export
  • Update jq queries for new storage_* metric names
  • Add mlperf_submission workload with required trial parameters
  • Enhance system detection for thread counts and memory limits
  • Update metric parsing for storage_throughput primary metric

Test Suite (test_kv_cache.py)

  • Add 170+ tests covering all new functionality
  • Add ConfigLoader tests: schema validation, defaults, file loading
  • Add cfg() helper tests for config-driven parameters
  • Add validate_args() tests for path safety and input validation
  • Add extended QoS tests for P99.9 and P99.99 percentiles
  • Add GPU eviction callback tests for metadata sync
  • Add per-tier bandwidth and KV bytes metric tests
  • Add storage_* metric naming tests for MLPerf compliance
  • Add waterfall eviction tests with high/low watermarks
  • Add storage_health PASS/FAIL criteria tests

Excel Export (json_to_xlsx.py)

  • Add P99.9 and P99.99 latency columns
  • Add per-tier KV bytes columns (GPU, CPU, Storage)
  • Add per-tier bandwidth columns (read/write GB/s)
  • Add storage tier device vs host latency breakdown
  • Rename nvme_entries to storage_entries for MLPerf compliance
  • Add storage_throughput_tokens_per_sec as primary metric

Configuration (config.yaml)

  • Add user_templates section with conversation patterns
  • Add qos_profiles with latency thresholds per tier
  • Add eviction settings with waterfall LRU parameters
  • Add storage_health criteria for PASS/FAIL determination
  • Add cache_sizing defaults for GPU/CPU/Storage tiers
  • Provides validated defaults for all tunable parameters

Documentation (README.md)

  • Add Configuration section with YAML parameter reference
  • Add MLPerf Submission Guidelines with validated commands
  • Add Excel metrics reference table with all output columns
  • Add installation instructions including pyyaml dependency
  • Add CLI arguments vs config file precedence documentation
  • Add workload definitions and tier configuration examples
  • Add troubleshooting section for common issues

Dependencies (requirements.txt)

  • Add pyyaml>=6.0 for YAML configuration file parsing

Test Plan

  • [x ] Run pytest on test_kv_cache.py (170+ tests passing)
  • [ x] Verify config.yaml loads correctly with default values
  • [x ] Run benchmark with --config flag and verify output metrics
  • [ x] Confirm storage_* metric naming in JSON output
  • [ x] Test Excel export with new columns

- Add ConfigLoader class with YAML config file support and schema validation
- Add cfg() helper function for config-driven parameter access
- Add validate_args() with safety limits for protected system paths
- Rename all nvme_* metrics to storage_* for MLPerf terminology compliance
- Add extended QoS percentiles: P99.9 and P99.99 latency tracking
- Add per-tier bandwidth metrics (read/write GB/s per tier)
- Add per-tier KV bytes tracking for detailed storage analysis
- Fix GPU metadata desync bug via on_eviction_callback pattern
- Change eviction from single-shot to iterative loop until space freed
- Replace print statements with Python logging module
- Add waterfall LRU eviction with configurable high/low watermarks
- Add storage_health section with PASS/FAIL criteria
- Add storage_throughput_tokens_per_sec as primary MLPerf metric
- Add -c DIR option for custom config directory
- Generate and pass config.yaml to Python script via --config flag
- Add --xlsx-output support for Excel export
- Update jq queries for new storage_* metric names
- Add mlperf_submission workload with required trial parameters
- Enhance system detection for thread counts and memory limits
- Update metric parsing for storage_throughput primary metric
- Add 170+ tests covering all new functionality
- Add ConfigLoader tests: schema validation, defaults, file loading
- Add cfg() helper tests for config-driven parameters
- Add validate_args() tests for path safety and input validation
- Add extended QoS tests for P99.9 and P99.99 percentiles
- Add GPU eviction callback tests for metadata sync
- Add per-tier bandwidth and KV bytes metric tests
- Add storage_* metric naming tests for MLPerf compliance
- Add waterfall eviction tests with high/low watermarks
- Add storage_health PASS/FAIL criteria tests
- Add Configuration section with YAML parameter reference
- Add MLPerf Submission Guidelines with validated commands
- Add Excel metrics reference table with all output columns
- Add installation instructions including pyyaml dependency
- Add CLI arguments vs config file precedence documentation
- Add workload definitions and tier configuration examples
- Add troubleshooting section for common issues
- Add kv-cache-test-report.html with full test execution results
- All 170+ tests passing for v3.0 features
- Create unit_test_results directory for test artifacts
- Add P99.9 and P99.99 latency columns
- Add per-tier KV bytes columns (GPU, CPU, Storage)
- Add per-tier bandwidth columns (read/write GB/s)
- Add storage tier device vs host latency breakdown
- Rename nvme_entries to storage_entries for MLPerf compliance
- Add storage_throughput_tokens_per_sec as primary metric
- Add pyyaml>=6.0 for YAML configuration file parsing
- Required for ConfigLoader and --config CLI argument
- Add user_templates section with conversation patterns
- Add qos_profiles with latency thresholds per tier
- Add eviction settings with waterfall LRU parameters
- Add storage_health criteria for PASS/FAIL determination
- Add cache_sizing defaults for GPU/CPU/Storage tiers
- Provides validated defaults for all tunable parameters
@hazemawadalla hazemawadalla requested a review from a team January 27, 2026 23:57
@hazemawadalla hazemawadalla requested a review from a team as a code owner January 27, 2026 23:57
@github-actions
Copy link

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant